Text File | 1994-11-30 | 30.2 KB | 610 lines | [TEXT/ALFA] |
- Contents:
- ABOUT MPEG
- ABOUT CONVERSION TO QUICKTIME
- MISC SIZE AND TIMING NOTES
- ABOUT THE THREAD USAGE
- ABOUT THE RESOURCES.
- TIMING FOR WAITNEXTEVENT() AND FRIENDS
- ACCURATE TIMINGS FOR SPARKLE.
- ABOUT THE MPEG ENCODING ALGORITHMS
-
- This file contains various pieces of information about MPEG and Sparkle.
- Parts of it are notes I take for myself as I alter and test the program.
- I included them here because I thought some other mac programmers out there
- might be interested in such things. Read through what you care about and
- understand and ignore the rest.
-
-
- -------------------------------------------------------------------------------
- ABOUT MPEG
-
- MPEG is an international standard for video compression. It compresses
- frames at two levels. Firstly frames are compressed internally, and
- secondly frame differences are compressed rather than transmitting full
- frames.
- To understand MPEG one should first understand JPEG. MPEG uses the same
- ideas as JPEG for much of its compression, and suffers from the same
- limitations.
- JPEG compression begins by changing an image's color space from RGB
- planes to YUV planes, where Y is the luminance (brightness of the image)
- and U and V store color information about the image. Half the color
- resolution in the horizontal and vertical planes is dropped, because the
- eye is less sensitive to color than to brightness. These YUV planes are
- then divided into 8 by 8 blocks of pixels. Each 8 by 8 block of 64
- pixels is put through a discrete cosine transform (a close relative of
- the Fourier transform), concentrating the energy of the image in a few
- coefficients at low frequencies. The high frequency terms of the
- transform can be discarded because the eye is not sensitive to them.
- The remaining coefficients are quantized and then encoded using a
- variable length coding scheme (basically Huffman coding) so that
- frequently occurring patterns of coefficients are transmitted in few
- bits and rare patterns in many bits.
-
- MPEG goes beyond this by adding support for inter-frame compression. This
- compression works by realizing that most video consists of foreground
- objects moving over a largely static background. Thus rather than
- transmit the foreground and background pictures over and over again, what
- is transmitted is the motion of the foreground objects. Note that this is
- different from the way QuickTime does interframe compression. What
- QuickTime does is just to subtract the two images from each other and
- compress the resultant image (which is hopefully largely blank space.)
- The MPEG scheme gives much better compression, but is much harder to
- program. It is essentially a pattern recognition problem, looking at a
- set of frames and deciding what pixels in the frame correspond to moving
- objects---the sort of thing humans are very good at and computers very
- bad at. For this reason, a complete MPEG compressor is complex and very
- slow.
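- The QuickTime-style subtraction scheme just described is easy to sketch.
- This toy example (invented names, tiny 8x8 "frames") subtracts frames and
- counts how much of the residual is blank:

```c
/* Sketch of difference-frame coding: the new frame is stored as
   (current - previous), which is mostly zero wherever the background
   did not change.  Illustrative only. */
#define W 8
#define H 8

/* Residual frame: diff = cur - prev, pixel by pixel. */
void frame_diff(int prev[H][W], int cur[H][W], int diff[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            diff[y][x] = cur[y][x] - prev[y][x];
}

/* Count zero entries -- a crude measure of how compressible
   the residual is. */
int count_zeros(int diff[H][W]) {
    int n = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if (diff[y][x] == 0) n++;
    return n;
}
```

- For a largely static scene the residual is almost all zeros and compresses
- well; for a panning shot it is not, which is where MPEG's motion search wins.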
-
- MPEG movies consist of three types of frames. I-frames are like JPEG
- images and are compressed by themselves. P-frames are compressed based on
- motion relative to a previous frame. B-frames are compressed based on
- motion relative to both a previous frame AND a future frame. How do you
- know what the future frame is? Well the MPEG data is not stored in the
- same order as it is displayed. You have to decode future frames before
- the B-frames on which they depend, then buffer the future frame
- somewhere. This is why MPEG players need rather more memory than
- QuickTime players.
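- The buffering just described can be sketched as follows. This is an
- illustrative toy (the real ordering is driven by the temporalReference
- fields), showing how a decoder releases frames in display order:

```c
/* Reference frames (I and P) arrive in the stream before the B-frames
   that depend on them, so the decoder holds one reference back while
   the B-frames are displayed.  Invented names, toy logic. */
#define NO_FRAME (-1)

/* types: 'I', 'P' or 'B', in decode (stream) order.
   display[] receives the decode indices in display order.
   Returns the number of frames written. */
int reorder(const char *types, int n, int display[]) {
    int held = NO_FRAME;   /* the buffered future reference frame */
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (types[i] == 'B') {
            display[out++] = i;          /* B-frames show at once */
        } else {
            if (held != NO_FRAME)
                display[out++] = held;   /* new reference releases old */
            held = i;
        }
    }
    if (held != NO_FRAME)
        display[out++] = held;           /* flush the last reference */
    return out;
}
```

- For the stream order I P B B this yields the display order I B B P, which is
- why the extra frame buffer is unavoidable.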
-
- As an example, here's a comment from my code.
- //About these three counters:
- //DecodedFrameNumber tells where we are in the file which we are currently
- //parsing, and is needed to find one's way around this file. It is incremented
- //every time a new frame is parsed.
- //DisplayedFrameNumber gives the number in temporal sequence of the frame that
- //is currently being shown on the screen.
- //For I and P MPEGs these are the same, but not for B MPEGs. For example a
- //B MPEG may have the sequence:
- //    0 1 2 3   4 5 6 7 8 9    10     decodedFrameNumber
- //    I P B B   I B B P B B    I      frameType
- //    0 3 1 2 | 2 0 1 5 3 4 |  2      display number (within group)
- //   ---------|--------------|-----   group boundaries
- //    1 4 2 3   7 5 6 10 8 9   ?      displayedFrameNumber
- //Note how the frames are clustered in groups, within which the Pict structure's
- //temporalReference field give the display number within that group.
- //The displayedFrameNumber is basically a running sum of these as one passes
- //from group to group, adjusted so that numbering starts at one rather than zero.
- //Now consider random access:
- //If we want to make a random access jump to a frame around displayed frame 5,
- //we will be vectored to decodedFrameNumber 4, which will then be decoded,
- //skipping past decodedFrameNumbers 5 and 6 (which depend on another frame in
- //addition to decodedFrameNumber 4, and hence can't be displayed) to finally
- //arrive at displaying decodedFrameNumber 4 as displayedFrameNumber 7.
- //the variable decodedFrameNumberOfVisibleFrame keeps track of this fact that
- //the displayedFrameNumber 7 actually represents decodedFrameNumber 4.
- //This information is necessary when stepping backwards through an MPEG.
- //If we are at displayedFrameNumber 7 and step back, we will look back for I-frames
- //until we get to the I-frame at decodedFrameNumber==4. But this is the I-frame of
- //the image we are just displaying, so we actually need to then step back to an
- //earlier I-frame.
- //This complication is all necessary partly because of the way MPEG forward
- //coding works, with the frame sequence on file not corresponding to the viewed
- //sequence, and partly because some B MPEGs do not have valid data in their
- //Pict.temporalReference fields. One cannot rely on that field being valid;
- //one has to maintain a state machine as one parses through the file.
-
- An MPEG movie can consist of only I-frames. This will be far from
- optimally compressed, but is much easier to encode because the pattern
- recognition is not needed. Such a movie is pretty much what you would get
- if you made a QuickTime movie and used the JPEG codec as the compression
- option. Because the I-frame movie is so much easier to calculate, it is
- much more common. Sparkle checks if a movie uses only I-frames and if so
- reduces its memory requirements since such movies do not need complex
- buffering. In the PC world, many people talk about XING type MPEGs which
- are pure I-frame MPEGs. These are produced by XING hardware on PCs and
- played back using the XING MPEG player.
-
- One problem with the MPEG standard is that many vendors seem to treat
- the parts of it they support as optional. XING, for example, often
- does not end its MPEGs properly, does not start frame numbering
- properly, and does not correct frame numbering after MPEGs are edited.
- GC technologies produces MPEGs whose frames are essentially randomly
- numbered, and which have garbage frames at the start.
- Wherever possible I have tried to adapt my code to common pathologies in
- MPEG encoders.
- I have also built in powerful yet computationally cheap
- error-detection and recovery. For example a recent MPEG posted to usenet
- drew widespread complaints because some of the uuencoded text was garbled
- and the resultant MPEG crashed pretty much every decoder out there. But
- Sparkle noticed the error and went on quite happily. Sparkle has also
- proved quite robust in the face of MPEGs I have deliberately corrupted.
- If you come across any MPEG file that causes Sparkle to crash or produce
- garbage, I WANT TO KNOW ABOUT IT. With a copy of the file, I can trace
- through Sparkle, find just what causes the crash, and make Sparkle even
- more robust.
-
- For more details on MPEG, read the MPEG FAQ on USENET. It is posted once
- a week to the picture groups and to news.answers.
-
- ------------------------------------------------------------------------------
- ABOUT CONVERSION TO QUICKTIME
-
- The following are notes I've made on conversion to QuickTime. I have
- investigated this issue extensively, but not exhaustively. If someone has
- comments on the subject---more extensive notes than I have, corrections,
- whatever, please tell me.
-
- All times I give are on my SE/30 with a 32-bit screen. People should
- extrapolate to their machines---I guess LC IIs are about half as
- fast and Centris/Quadras three to six times as fast.
-
- The useful codecs are video, cinepak (used to be compact video) and jpeg.
- JPEG compression at normal quality gives files of very good quality and not
- much larger than pure I-frame MPEGs. A 120x160 image can play back at about
- 4fps. Translate that to an '040 and you get a useful frame rate. However
- JPEG has a major problem: when it decodes to a 32-bit screen, it draws
- directly to the screen rather than to an offscreen GWorld as the other
- codecs do. This produces movies with obvious tearing artifacts. When
- fast-dithering is used to draw to other screen depths, it works fine. I
- don't understand why 32-bit screens should be special, but I have told Apple
- about this problem and maybe it'll be fixed in a later release of
- QuickTime. Meanwhile write to Apple and complain---they are holding back a
- useful capability.
-
- With the video and cinepak compressors, it is very important to check the
- key-frame rate checkbox. Key-frames are like MPEG I-frames. They are
- compressed standalone and do not depend on other frames. The other
- frames produced by the movie codecs depend on previous frames. Setting
- the key-frame rate guarantees that key-frames will occur at least at
- that rate. Checking the key-frame rate checkbox allows the movie to use
- inter-frame compression (ie not just key-frames) and gives movies half
- the size they would otherwise be. The lower you set the key-frame rate
- (this means a larger number in the QuickTime saving options dialog
- box), the smaller your movie will be.
- For example a 72K MPEG (48 frames, 120x160, pure I-frame) became a 290K
- movie without keyframes, a 160K movie with a key-frame rate of 1 in 8,
- and a 138K movie with a key-frame rate of 1 in 96.
- The price you pay for a low key-frame rate is that the movie has more
- difficulty when playing backwards, or when randomly jumping around. I
- don't find it a problem and usually use a key-frame rate of about 1 in
- 100, but try for yourself to see what things are like.
- Video gives better quality results when a higher key-frame rate is used.
- Strangely, cinepak appeared to give lower quality results (as well as a
- larger movie) when more key-frames were used.
- I'll have to investigate this further---I may have become confused when I
- was making the measurements. Anyone want to confirm or deny this?
- (For comparison, this same movie became a 90K JPEG movie.)
-
- I find video and cinepak give much the same file sizes at the same
- (around normal) quality setting. The cinepak file is consistently a
- little larger, but not enough to matter. The video file is consistently
- lower quality for the same size as the cinepak file. However the video
- low quality artifacts (blocks of solid color) I find less psychologically
- irritating than the cinepak low quality artifacts (general fuzzing of
- borders like everything is drawn in crayon and watercolor).
- However cinepak has the advantage of playing back much faster than video.
- For a 120x160 image on my 32bit screen, I can get smooth playback with
- cinepak at 24fps. Video can do smooth playback up to about 16 fps.
-
- Fast dithering seems to do a good job for speed (at the cost of quality).
- Unlike earlier versions of QuickTime, with 1.6.1 I found the same speed
- of playback (ie same degree of skipping frames or not) at every screen
- depth but 2 bit depth.
-
- Cinepak can support a largish MPEG to QuickTime movie (352x240) at 6fps
- on my mac, but no faster.
-
- Compression using cinepak is SLOW SLOW SLOW. A 120x160 frame takes about
- 10 seconds to compress. A 352x240 frame takes about a minute. In this
- time your mac is stuck---it looks like it has crashed. Don't start saving
- to cinepak QuickTime unless you are prepared to walk away from your mac
- and not touch it until it's done.
- QuickTime 1.5 did not include any way to do this compression in small
- chunks so that it would run nicely in the background. I received word
- today that QuickTime 1.6 does have this capability, so once I get the
- relevant technical documents and read them, I will add this ability.
-
- See the WHY DOESN'T SPARKLE DO... section for more information about MPEG
- frame rates and their relationship to QuickTime frame rates.
-
- ================================================================================
-
- MISC SIZE AND TIMING NOTES
-
- These are rough notes I take as I alter the code, partially out of interest
- and partially to guide me in where I need to change things.
- They may be of interest to some of you.
- They are timed on my new Q610. The timings may not be consistent with
- each other as they reflect the state of the code at different times. In
- between I may change the code quite a bit---mainly of interest are the
- differences within any group of results.
-
- Timings for Erika1.MPG under Sparkle 2.0 on my Q610.
- This is a 41 frame pure I 120x160 frame MPEG.
-
- These times are for a version of the code that does not call WNE while playing:
-
- 1) Effect of the screen depth/dithering on times: (non-optimized code)
- 24-bit color: 8.2 s
- 16-bit color 9.2 s
- 8-bit color 10.1 s
- 8-bit grey 8.6 s
- Conclusion:
- probably worth adding hints to speed up some parts of the code to compensate
- for the dithering times:
- 1) For 8 bit color use 4x4 IDCT.
- 2) For 8 bit grey, omit YUV->RGB conversion.
- 3) For 16 bit color, use a special YUV->RGB conversion.
-
- 2) Effect of various TC6 optimizations: (24-bit screen)
- Defer and combine stack adjusts: 7.8 s
- Suppress redundant loads: 7.7 s
- Automatic register assignment: 7.6 s
- Global:
- Induction variables: 7.4 s
- Common sub-expression elimination: 7.3 s
- Code motion: 7.2 s
- Register coloring: 6.6 s
-
- 3) Effects of various display updates: (no optimizations)
- No progress proc at all (implies NOTHING ever updated on screen): 6.7 s
- Progress proc called but does nothing: 6.8 s
- Progress proc updates movie controller/text track only: 7.6 s
- Progress proc updates only MPEG frames, not movie controller 7.3 s
- Progress proc updates both: 8.1 s
- Conclusion:
- of the 8.1 s, 0.8 s=10% is used updating movie controller and
- 0.5 s= 6% is used updating the MPEG frames.
-
- 4) Effect of the time allowed a thread before it yields:
- Yield time=6000 ticks (ie never yield) 8.0 s
- 180 ticks 8.1 s
- 60 ticks 8.2 s
- 20 ticks 8.6 s
- One would rather have a 20 tick time than a 60 tick time for increased
- user interactivity, but the time cost is rather stiff.
- However by implementing a new thread scheduler, I should be able to reduce
- this cost somewhat.
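- The quantum check behind these timings can be sketched like this. The
- names are invented, with ticks() standing in for the Mac's TickCount():

```c
/* At "yield bracket" points in its loops, the running thread checks
   whether its time slice (in 1/60 s ticks) is used up.  fake_ticks and
   yield_count exist only so the sketch can be exercised; a real thread
   would call the Thread Manager's yield routine instead of counting. */
static long fake_ticks = 0;          /* simulated tick counter */
static long ticks(void) { return fake_ticks; }

static long quantum = 20;            /* ticks allowed before yielding */
static long slice_start = 0;
static int  yield_count = 0;

/* Called at hot points in the decoder's loops. */
void maybe_yield(void) {
    if (ticks() - slice_start >= quantum) {
        yield_count++;               /* a real thread would Yield() here */
        slice_start = ticks();       /* start a fresh quantum */
    }
}

/* Simulate a decode loop where each iteration burns one tick. */
void run_loop(int iterations) {
    for (int i = 0; i < iterations; i++) {
        fake_ticks++;
        maybe_yield();
    }
}
```

- The smaller the quantum, the more often the check fires, which is exactly the
- interactivity-versus-speed tradeoff measured above.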
-
- 5) Effect of yield time in the background:
- We convert Erika1.MooV to an I-frame MPG.
- FG time (yield time of 30 ticks): 1min 12s
- BG time (yield time of 10 ticks) 2min 30s
- BG time (yield time of 30 ticks) 2min 04s
- Conclusion:
- The longer yield time is obviously better but makes things more choppy.
- Best is probably to implement a timer keeping track of how fast we are
- getting background NULLs and increasing bgYieldTicks as we notice less
- fg activity.
-
- 6) Note:
- I have tried to put yield brackets around all the hotpoints of the code to
- make it run well in background. The main problem for now, that I need to work
- around (ProgressProcs ?) is when the new frame is requested for coding an MPEG
- or QT movie from a QT document. The fiddling that goes on to obtain this frame
- can be fairly substantial, taking as long as 70 or 80 ticks for a simple
- 160x120 movie. My guess is that QT doesn't do very smart caching about
- non-synch frames and has to decompress a long sequence to get to these frames.
- Anyways, because of this we're stuck with a basic jerkiness at that
- granularity for now.
-
- 7) Effects of four different P algorithms.
- We convert Erika1.MooV to four MPEGs, all using a PPPI pattern,
- with an I-quantization of 8 and a P-quantization of 10.
- Algorithm: Time: Size:
- Logarithmic 1:45 min 53 953
- Two level 2:45 min 54 328
- Subsample 3:45 min 54 765
- Exhaustive 5:55 min 54 677
- There was no obvious difference in quality between these MPEGs (and they
- were all pretty lousy). Thus there seems no real advantage to using anything
- but the fastest algorithm.
-
- 8) Effects of P-quantization.
- Even with a P-quantization of 8, the above setup does not produce as good
- an image as a pure I sequence (although the file size of 62K is much smaller.)
- This appears to be largely due to the successive dependencies caused by
- the three successive P frames.
- Is it better to reduce the number of Ps or lower the P-quantization?
- Using same pattern but P-quantization of 4 gives a file size of 98K and
- a quality lower than the pure I-frames (though certainly better than what
- we had). Using a pattern of PPI and P-quantization of 8 gives a file size of
- 71K and the same sort of quality.
-
- Using a PBBIBB pattern and all quantizations as 8 gives a size of 60K and
- the same sort of quality.
-
- Conclusions:
- 1) I need to use a higher quality source of images to investigate these
- effects.
- 2) I think the P and B pattern matching criteria may be a bit dodgy, or maybe
- some part of my code has problems with half-pixels or such.
-
- 9) Effect of buffer size.
- I played a 750K MPEG of 150 frames. With a buffer size of two frames, it took
- 36s. With a buffer size of 200 frames (ie entire movie) it took 33s. Thus
- the larger buffer buys about 10% speed.
- So maybe, when there is time, create massive buffers which are in some way shrinkable.
- ----------------------------------------------------------------------------------
-
- Sizings for Erika1.MPG under Sparkle 2.0
-
- 1) Using only I-frame encoding with varying I-quantization:
- I-quantization size in bytes
- 1 237 307
- 2 179 960
- 4 132 916
- 8 92 210
- 16 66 821
- 24 42 658
- //These two values are bogus, now that I've cleaned up the MPEG generating
- //code.
- // 32 37 094
- // 64 25 955
- DC terms only 21 695
-
- Notes:
- • These sizes are probably slightly larger than necessary as at present I do not
- pad the excess pixels where frame size is smaller than the frame size in
- macroblocks, thus the DCT is encoding crud at those borders. By padding those
- to DC we'll get a small shrinkage in size.
- ! This was fixed in version 2.01. The shrinkage was way more than I
- expected, of the order of 15%.
- • With this set of images (which were pretty lousy to begin with) a quantization
- level of 8 produced acceptable images, while a level of 16 produced
- unacceptable quality.
- ================================================================================
-
- ABOUT THE THREAD USAGE
-
- I have nothing special to say about using threads except that I recommend
- all serious Mac coders read the Apple documentation and use them. They
- make life so much easier. The 1.x code was full of the most ghastly and
- convoluted logic to enable breaking out of the MPEG decoder into the main
- event loop and back again. The 2.x encoding code, however, is
- ridiculously simple. We simply have the encoder, written (like a UNIX
- process or such) as one long loop, and then at appropriate points in the
- loop we make Yield() calls.
-
- The one thing that one has to be careful of is using exception handling in
- the TCL. Because this is based on application wide globals, dreadful
- things can happen in your thread when an exception occurs, a string of
- CATCH handlers is followed up the stack, and at some point you leave the
- stack of the thread and enter the "stack of the application". My solution
- to this was to use custom thread context switchers which, on every context
- switch, swap the appropriate exception handling globals.
- The custom context switchers also become a good place for updating the
- timings of each thread and setting when it will next yield.
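- The swap pattern looks like this. gCatchTop and the structures here are
- invented stand-ins for the TCL's real exception globals; only the idea
- (a private per-thread copy, swapped at every context switch) is real:

```c
/* Each thread keeps a private copy of the application-wide handler
   chain head; the custom context switcher swaps it in and out, so a
   CATCH chain never crosses from one thread's stack to another's. */
typedef struct Handler Handler;
struct Handler { Handler *next; };   /* a node in the CATCH chain */

static Handler *gCatchTop = 0;       /* stand-in for the TCL global */

typedef struct {
    Handler *savedCatchTop;          /* this thread's private copy */
} ThreadContext;

/* Called by the scheduler when a thread loses the CPU. */
void swap_out(ThreadContext *t) {
    t->savedCatchTop = gCatchTop;    /* remember my handler chain */
}

/* Called when a thread gains the CPU. */
void swap_in(ThreadContext *t) {
    gCatchTop = t->savedCatchTop;    /* restore my handler chain */
}
```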
-
- At present I'm only using cooperative threads. It's not clear to me that
- switching to pre-emptive threads is a useful exercise. One problem is, of
- course, that pre-emptive threads make life rather trickier and coding more
- complex. More to the point, pre-emptive threads only get half the CPU
- time, while the WaitNextEvent() loop gets the other half. So by switching
- to them I'd lose half my speed, and not gain much. I might gain
- slightly smoother user event support, especially in the background, but
- that's not that bad right now and will improve when I install a custom
- thread scheduler in place of the hokey quick kludge I'm using right now.
- If anyone out there has worked with pre-emptive threads and has opinions
- on them one way or the other, please let me know.
-
- A second major change in the 2.x code is I have now structured things
- around a model of video source objects and video encoder objects, with any
- video source able to be linked to any video encoder.
- This makes for very orthogonal extensible code.
- The natural extension of this is now to define more video sources. In
- particular as soon as I can I hope to get to work on morphing routines,
- with output that can be played to screen or saved in whatever video
- formats I'm supporting by that stage. I have some ideas for morphing
- algorithms, but if anyone can send me code, or tell me whence I can ftp it
- (yes this usage of whence is correct) I'll obviously be able to get going
- faster. Along the same lines, anyone know where I can get AVI source, or
- the AVI specs so I can add AVI support?
-
- UPDATE FOR 2.1
-
- The connection between threads is now based on a message queue associated
- with each thread. When a message is passed to a thread it is enqueued.
- If a thread is busy (ie playing or saving to some format) and asks for
- messages when there are none it is given a NULL message which it uses to
- perform idle processing, otherwise it is put to sleep. Obviously this
- mechanism looks very like the Process Manager's behavior.
- Two consequences emerge from this.
- The first is that I can now, in the main event loop, peek for events and
- if there are no events in my main event queue, return immediately. This
- allows me to avoid the overhead of the (very expensive) main event loop while
- maintaining high interactivity. The cost of high interactivity is thus
- reduced from about 12% of play time to about 1%.
- The second is that it makes it much easier to glue the user interface to a
- different MPEG encoder or decoder (eg dedicated hardware) because the
- connection between the user interface and the threads doing the work is
- asynchronous.
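- A sketch of the queue mechanism, with invented names and a plain
- circular buffer (the real code also puts idle threads to sleep):

```c
/* Per-thread message queue.  A busy thread asking for a message when
   the queue is empty receives MSG_NULL and uses it for idle
   processing; message codes and sizes are invented. */
#define MSG_NULL  0
#define QUEUE_LEN 16

typedef struct {
    int msgs[QUEUE_LEN];
    int head, tail, count;
} MsgQueue;

/* Returns 1 on success, 0 if the queue is full. */
int enqueue(MsgQueue *q, int msg) {
    if (q->count == QUEUE_LEN) return 0;
    q->msgs[q->tail] = msg;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    return 1;
}

/* Next message, or MSG_NULL when the queue is empty.
   (A real implementation would put an idle thread to sleep instead.) */
int get_message(MsgQueue *q) {
    if (q->count == 0)
        return MSG_NULL;        /* idle processing for a busy thread */
    int msg = q->msgs[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return msg;
}
```

- Because the user interface only ever enqueues, it neither knows nor cares
- what is consuming the messages -- which is what makes the coupling to a
- different encoder or decoder so loose.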
- ================================================================================
-
- ABOUT THE RESOURCES.
-
- The default for the flags for all resources is purgable.
- However there are some exceptions.
- • DLOGs and DITLs that will be opened by the TCL needed to be set nonpurgable
- because of a bug in the TCL. I have altered CDLOGDialog to fix this and
- these resources can now be purgable.
- • The DLOGs and DITLs used by CustomGetFile() and CustomPutFile() appear to
- need to be non-purgable even though IM says they can be purgable. If they
- are set purgable the MemHell INIT registers errors during CustomGetFile()
- and CustomPutFile().
- • Menus may not be purgable because the menu manager does not allow that.
- Given this, one might as well make them preload and locked.
- Likewise for the MBAR.
- • The DaTa resources, used to construct decoding tables, are mostly preload
- locked and treated just like const global data. However there are a few
- small tables which it is advantageous to load into genuine global
- arrays. For that case, the resources are marked purgable.
- • Marking resources Protected does not ever seem to be useful.
-
- • If a dialog/alert makes use of an ictb to change the font of text items,
- the associated dctb or actb must also be present or else nothing will
- happen.
- • Note that some of the dctb/ictb resources may appear redundant. However they
- prove to be necessary in unexpected ways. For example if they are not
- present for the CustomPutFile() DLOG, the dialog box drawn on screen will
- use a dotted grey pattern to draw the items in the scrolling list of files,
- rather than using a nice grey color.
- ================================================================================
-
- TIMING FOR WAITNEXTEVENT() AND FRIENDS
-
- I wrote a simple loop that timed 200 passes of each of these calls and
- recorded the times. For my outer loop over events, I want to know the
- cheapest way of ascertaining whether I should get an event or not.
- For the last item, we timed 200 loops over the TCL core event routine,
- CApplication::Process1Event(), with nothing happening.
-
- Time (in ticks) for 200 passes of:
- EventAvail() 30
- GetNextEvent 62
- WaitNextEvent0 93 (WNE with mouseRegion==NULL and sleep==0)
- WaitNextEvent1 493 (WNE with mouseRegion==NULL and sleep==1)
- Loops over TCL Idle 1027
- ================================================================================
-
- ACCURATE TIMINGS FOR SPARKLE.
-
- These are timings taken for Sparkle 2.1 on my Q610 with 24bit color. The idea was
- to accurately time what were the hot spots in MPEG playback.
- All times reported are in ticks (60th of a second) and reflect the second time
- the operation was performed. Very consistently it was found that the first time
- an operation was performed took 11 ticks longer than subsequent times, presumably
- reflecting loading in purgable resources and such initialization.
-
- Times to play Erika1.mpg. All times reflect playback time for 40 frames.
- The first time is split into times with debugger and without.
- Once the program never calls WNE the debugger time becomes the same as the app time.
-
- Standard parameters, thread time quantum=20 ticks,
- delay before yielding to other apps via WNE was infinity
- When debugging time was 433 ticks 5.5 fps
- As an application 402 ticks 6.0 fps
-
- If we set an infinite yield time
- 399 ticks 6.0 fps
- // All subsequent times use an infinite yield time.
- If we don't use a progress proc
- 370 ticks
-
- If we use a progress proc but it doesn't update the screen
- 373 ticks
-
- If we update the screen but don't use a movieController
- 332 ticks
-
- //All subsequent times use an infinite yield time and no movie controller.
- If we omit YUV to RGB conversion
- 225 ticks
-
- If we use only a DC term IDCT
- 229 ticks
-
- If we don't even call IDCT
- 226 ticks
-
- If we don't call ReconstructIBlock()
- 295 ticks
-
- If we use a 4x4 pseudo IDCT (very rough---can be written to be faster)
- 320 ticks
-
- If we use QUALITY_STANDARD not QUALITY_HIGH
- 324 ticks
-
- If we use QUALITY_LOW not QUALITY_HIGH
- 294 ticks
-
-
- From this we conclude that for 40 frames:
- WNE/thread yield overhead=3 to 4 ticks in an app, but about 35 ticks when debugging.
- (This is nice---it means my scheme for drastically cutting WNE time works!)
- Progress proc function call overhead= 3 ticks.
-
- CopyBits to 24bit screen= 25 ticks
- MovieController overhead= 40 ticks
- YUV to RGB =105 ticks
- IDCT =100 ticks
- Reconstruct I blocks = 30 ticks (does cropping of the results)
- So Huffman = 95 ticks
-
- From these results we can expect to shave 15 ticks off the IDCT by using 4x4.
- We can shave off 30 ticks by using QUALITY_LOW.
- We can probably cut YUV to RGB in two or more by using the upper 5 bits of each
- of YUV into a table.
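- That table idea can be sketched as follows. The fixed-point constants are
- the standard YUV-to-RGB ones; everything else is invented for illustration:

```c
/* Index a precomputed table by the upper 5 bits of each of Y, U and V,
   trading a little color accuracy for a conversion that is three
   shifts and a lookup.  Sketch only, not Sparkle's code. */
typedef struct { unsigned char r, g, b; } RGB;

static RGB yuv_table[32][32][32];     /* 32768 entries */

static int clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* Build the table once, converting the low edge of each 8-wide bucket.
   Constants are 1.402, 0.344, 0.714 and 1.772 in 16.16 fixed point. */
void build_yuv_table(void) {
    for (int yi = 0; yi < 32; yi++)
        for (int ui = 0; ui < 32; ui++)
            for (int vi = 0; vi < 32; vi++) {
                int y = yi << 3;
                int u = (ui << 3) - 128;
                int v = (vi << 3) - 128;
                yuv_table[yi][ui][vi].r = clamp255(y + (91881 * v >> 16));
                yuv_table[yi][ui][vi].g = clamp255(y - (22554 * u >> 16)
                                                    - (46802 * v >> 16));
                yuv_table[yi][ui][vi].b = clamp255(y + (116130 * u >> 16));
            }
}

/* The per-pixel conversion is then just a lookup. */
RGB yuv_to_rgb(int y, int u, int v) {
    return yuv_table[y >> 3][u >> 3][v >> 3];
}
```

- With only 32 levels per component the color error is at most 7 codes out of
- 256, which should be invisible next to dithering; the table costs under 100K.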
-
-
- Some further results are:
-
- Suppose we simply run the movie controller/text track and don't ever call
- the MPEG code. This takes about 55 ticks.
- Suppose we hide the movie controller: 422 ticks (cf 433 ticks)
- text track : 418 ticks
- both : 408 ticks
-
- The above timings largely correlate with the 2.1 code but fail to reflect
- a few changes I've made since they were taken. Sometime I'll update them
- again.
- -------------------------------------------------------------------------------
-
- ABOUT THE MPEG ENCODING ALGORITHMS
-
- Here is a little info about the MPEG encoding algorithms.
-
- First the P-search algorithms.
- In these algorithms, we have a given macroblock (a 16x16 block of pixels) and we
- wish to search the previous frame to find the best match to this macroblock.
-
- Logarithmic search works by dividing a search range (a given area of pixels,
- say 20x20 in size) into nine squares, selecting the center of each square, and
- testing how well the area around that center matches the current macroblock.
- The area of the center that matches best is itself divided into nine, and so on
- until one can go no further. This is a very fast search technique. Without
- halfpixels enabled, it takes 25 compares per macroblock. With halfpixels
- enabled, it takes 33 searches per macroblock. Thus halfpixels are not much more
- expensive, and substantially lower the size of the resultant file (by 5 to 10%).
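- The search can be sketched as the classic three-step refinement below.
- This is an illustration on a toy 64x64 frame, not Sparkle's code, and
- half-pixel refinement is left out:

```c
/* Logarithmic (three-step) motion search: start with step 4, keep the
   best of nine candidates around the current best, halve the step and
   repeat.  Frame sizes and names are invented for the sketch. */
#define FRAME_W 64
#define FRAME_H 64
#define MB      16              /* macroblock is 16x16 pixels */

/* Sum of absolute differences between the macroblock at (bx,by) in cur
   and the one displaced by (dx,dy) in ref. */
long sad(unsigned char ref[FRAME_H][FRAME_W],
         unsigned char cur[FRAME_H][FRAME_W],
         int bx, int by, int dx, int dy) {
    long s = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++) {
            int d = (int)ref[by + dy + y][bx + dx + x]
                  - (int)cur[by + y][bx + x];
            s += d < 0 ? -d : d;
        }
    return s;
}

/* Writes the winning motion vector to (*mvx, *mvy).
   Assumes the whole search stays inside the frame. */
void log_search(unsigned char ref[FRAME_H][FRAME_W],
                unsigned char cur[FRAME_H][FRAME_W],
                int bx, int by, int *mvx, int *mvy) {
    int cx = 0, cy = 0;
    long best = sad(ref, cur, bx, by, 0, 0);
    for (int step = 4; step >= 1; step /= 2) {
        int nx = cx, ny = cy;
        for (int dy = -step; dy <= step; dy += step)
            for (int dx = -step; dx <= step; dx += step) {
                long s = sad(ref, cur, bx, by, cx + dx, cy + dy);
                if (s < best) { best = s; nx = cx + dx; ny = cy + dy; }
            }
        cx = nx; cy = ny;
    }
    *mvx = cx; *mvy = cy;
}
```

- Three rounds of nine candidates (with the center shared between rounds) is
- where the low compare count comes from.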
-
- Two level search works by searching all the possible displacements in the
- search range that are even (if halfpixels are enabled) or a multiple of four
- (if halfpixels are not enabled). The site with the best match is then tested
- at all eight sites around it, and the best of the nine is chosen.
- This takes 108 searches per macroblock without halfpixels, and 408 if
- halfpixels are enabled.
-
- Exhaustive search searches every acceptable motion in the search range to find the
- best match. It takes 400 searches without halfpixels on, and 1600 with half pixels
- on.
-
- Now the B algorithms.
- All of these algorithms use as their starting point whatever P-algorithm you have
- chosen and work with that as a basic block.
-
- The simple algorithm does two P-searches, searching the current macroblock against
- the past picture and against the future picture. The best past motion vector and
- the best forward motion vector are used to calculate the interpolating frame,
- and the best of forward, backward and interpolating motion is calculated.
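- A sketch of that final choice, on a row of 16 pixels rather than a full
- macroblock (names invented):

```c
/* The forward-predicted and backward-predicted blocks are averaged to
   form the interpolated prediction, and the encoder keeps whichever of
   the three predictions matches the actual block best. */
#define BLK 16

/* Interpolated prediction: rounded average of the two references. */
void interpolate(unsigned char fwd[BLK], unsigned char bwd[BLK],
                 unsigned char out[BLK]) {
    for (int i = 0; i < BLK; i++)
        out[i] = (unsigned char)((fwd[i] + bwd[i] + 1) >> 1);
}

/* Sum of absolute differences of a prediction against the actual block. */
long pred_error(unsigned char pred[BLK], unsigned char actual[BLK]) {
    long s = 0;
    for (int i = 0; i < BLK; i++) {
        int d = (int)pred[i] - (int)actual[i];
        s += d < 0 ? -d : d;
    }
    return s;
}

/* Returns 0, 1 or 2 for forward, backward or interpolated,
   whichever predicts the actual block best. */
int best_mode(unsigned char fwd[BLK], unsigned char bwd[BLK],
              unsigned char actual[BLK]) {
    unsigned char interp[BLK];
    interpolate(fwd, bwd, interp);
    long ef = pred_error(fwd, actual);
    long eb = pred_error(bwd, actual);
    long ei = pred_error(interp, actual);
    if (ef <= eb && ef <= ei) return 0;
    if (eb <= ei) return 1;
    return 2;
}
```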
-
- Cross2 does four P-searches. The first two proceed as in Simple search. Then, using
- the forward motion vector from the simple search, a P-search is made of the future
- picture to find the best interpolation forward vector. Likewise using the forward
- vector from the simple search, a P-search is made of the past picture to find the
- best interpolation backward vector. The best match of all four searches is used.
-
- Finally exhaustive search performs a whopping 400 P-searches per macroblock, 1600
- if half-pixels are on. Each of those searches uses the current forward motion vector
- to search for a best interpolating backward motion vector.
-
- -------------------------------------------------------------------------------
-